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Summary 

In  this  paper  we  consider  the  analysis  of  record  breaking  datasets,  where  only  observations 
that  exceed  (or  only  those  that  fall  below)  the  current  extreme  value  are  recorded.  Examples  of 
application  areas  leading  to  data  of  this  type  include  ii.dustrial  stress  testing,  meteorological  anal* 
ysis,  sporting  and  athletic  events,  and  oil  and  mining  surveys.  The  inherent  missing  data  structure 
present  in  such  problems  leads  to  Ukelihood  functions  that  contmn  possibly  high-dimensional  inte¬ 
grals,  thus  rendering  traditional  maximum  likelihood  methods  difficult  or  infeasible.  Fortunately, 
we  may  obtmn  arbitrarily  accurate  approximations  to  ^he  likelihood  function  by  iteratively  apply¬ 
ing  Monte  Carlo  integration  methods  (Geyer  and  Thoiapson,  1992).  Subiteration  using  the  Gibbs 
sampler  may  help  to  evaluate  any  multivariate  integrals  encountered  during  this  process.  This  ap¬ 
proach  enables  a  far  more  sophisticated  set  of  parametric  models  than  have  been  applied  previously 
in  record  breaking  contexts.  In  particular,  we  illustrate  the  methodology  for  a  wide  array  of  discrete 
and  continuous  distributional  settings,  and  for  observations  that  may  be  correlated  and  subject  to 
mean  shifts  over  time.  Related  issues  in  model  selection  and  prediction  are  also  addressed.  Finally, 
we  present  two  numercal  examples.  The  first  uses  a  generated  dataset  exhibiting  a  high  degree  of 
Autocorrelation,  while  the  second  involves  records  in  Olympic  high  jump  competition. 

Some  key  words:  Gibbs  sampler;  Missing  data;  Monte  Carlo  approximant. 


1  Introduction 

The  subject  of  this  paper  is  estimation  and  prediction  in  contexts  where  data  points  accumulate 
•equentially  over  time,  but  only  those  that  break  the  current  record  (i.e.,  represent  a  new  maximum 
or  minimum  value)  are  observed.  Data  of  this  type  arise  in  a  wide  variety  of  practical  situations. 
The  history  of  achievement  in  sporting  and  athletic  events  (such  as  the  times  required  to  run  one 
mile,  or  top  land  speeds)  is  often  recorded  only  in  record  breaking  format.  In  meteorology,  record 
high  temperatures  or  water  levels  are  sometimes  all  that  are  available  for  a  given  location.  In 
industnal  stress  testing,  a  manufacturer  will  typically  test  a  series  of  finished  products  only  up  to 
the  current  ot  .  ved  minimum  strength,  rather  than  simply  increasing  the  pressure  on  each  item 
until  it  breaks.  Data  of  this  type  are  closely  related  to  what  might  be  called  threshold  data,  where 
only  observations  that  exceed  (or  only  those  that  fall  below)  a  certain  level  are  observed.  Examples 
of  this  situation  include  studies  of  publication  bias,  drug  effectiveness,  and  patient  monitoring  in 
which  a  physician  only  sees  a  patient  when  the  latter  is  (or  believes  he  is)  ill  enough  to  justify  the 
visit. 

Datasets  of  this  general  class  can  take  many  different  forms,  each  requiring  its  own  probability 
model.  Record  breaking  opportunities  may  arise  in  a  systematic  way  (as  in  an  annual  auto  race)  or 
completely  at  random  (as  in  a  new  record  for  the  number  of  college  students  in  a  single  phone  booth). 
The  former  situation  seems  to  call  for  a  discrete  time  stochastic  model;  the  latter,  a  continuous 
time  model  (although  here  we  might  also  group  the  events  into  convenient  time  blocks  and  treat 
the  result  as  a  discrete  time  series).  Another  distinction  involves  whether  we  have  information 
concerning  the  failed  record  breaking  attempts  or  not.  In  some  discrete  time  settings,  we  will  have 
such  aimliary  information  (e.g.,  for  an  annual  race),  but  not  in  most  continuous  time  settings. 

Given  the  form  of  the  dataset  in  hand,  we  must  adopt  realistic  assumptions  concerning  the 
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nature  of  the  process  creating  the  record  breaking  attempts.  In  particular,  the  assumptions  of 
independence  and  a  constant  mean  for  the  sequence  giving  rise  to  the  records  may  or  may  not 
be  appropriate.  For  example,  independence  may  be  reasonable  for  the  sequence  leading  to  record 
values  in  destructive  stress  testing,  but  not  for  weekly  financial  records.  When  the  assumption 
of  independence  is  not  warranted,  we  may  wish  to  adopt  a  Markovian  dependence  structure,  but 
in  some  cases  even  this  will  be  too  restrictive.  Concerning  the  constant  mean  hypothesis,  records 
arising  from  an  experienced  player’s  scores  in  a  card  game  are  not  likely  to  shift  over  time,  but 
we  would  expect  such  a  shift  in  most  athletic  competitions  due  to  improvements  in  equipment, 
nutrition,  conditioning,  preparation,  and  heredity  (bearing  out  the  old  sports  adage,  “records  were 
made  to  be  broken”). 

While  there  is  a  large  amount  of  literature  on  the  probability  modeling  of  record  breaking 
data,  relatively  little  exists  on  the  problem  of  statistical  inference  in  these  contexts.  Click  (1978) 
contains  an  excellent  review  of  the  probabilistic  work  and  a  brief  discussion  of  tests  for  randomness 
in  record  breaking  sequences.  Ikyfos  and  Blackmore  (1985)  discuss  the  forecasting  of  future  record 
values  given  only  past  records,  but  only  in  the  case  of  an  independent  and  identically  distributed 
anderl3ring  sequence.  Samaniego  and  Whitaker  (1986)  focus  instead  on  the  problem  of  inference  on 
the  underlying  model  given  the  records,  but  again  consider  only  the  independent  and  identically 
distributed  case,  dealing  primarily  with  estimating  the  mean  of  a  single  exponential  pop.  ’ation. 
In  a  second  paper,  Samaniego  and  Whitaker  (1988)  adopt  the  same  framework  but  with  only  a 
nonparametric  distributional  specification.  Smith  (1988)  retains  the  independence  assumption  but 
drops  that  of  the  constant  mean,  entertaining  linear,  quadratic,  and  exponential  decay  models 
under  normal,  Gumbel,  and  generalized  extreme  value  errors. 

In  this  paper  we  offer  a  completely  general  approach  for  parametric  likelihood  inference  and 
prediction  in  record  breaking  contexts.  The  only  requirements  for  our  approach  to  be  applicable  are 


the  specification  of  the  joint  distribution  of  the  entire  data  sequence  and  the  index  set  of  the  record 
breaking  observations.  The  general  form  of  the  likelihood  to  be  maximized  in  record  breaking 
problems  is  laid  out  in  Section  2.  Section  3  introduces  our  Monte  Carlo  computational  approach, 
and  discusses  its  implementation  using  Markov  chain  sampling  methods.  Section  4  outlines  several 
specific  models  for  record  breaking  data,  in  order  to  give  the  reader  an  idea  of  the  generality  of  the 
methodology.  Section  5  gives  two  numerical  examples  illustrating  our  methodology,  the  first  using 
an  artificial  dataset  created  to  exhibit  high  serial  correlation,  and  the  second  comprising  actual 
records  in  Olympic  high  jump  competition. 

2  Record  breaking  likelihoods 

Conceptually,  record  breaking  sequences  arise  within  a  chronological  sequence  of  events.  For  a 

sequence  of  n  events  we  denote  the  occurence  times  by  {tiitj . tn}>  &nd  the  associated  set  of 

measurements  by  Y  =  (Yi,  Yj, . . . ,  Yn).  If  the  events  occur  regularly,  e.g.  daily  or  annually,  we  can 
replace  the  t’s  by  the  index  set  {1, 2, . . . ,  n).  As  noted  in  Section  1,  our  formulation  presumes  that 
within  this  sequence  we  see  only  the  record  breaking  Y’s,  but  as  a  result  we  also  know  that  no 
records  occurred  at  the  unseen  Y’s.  Let  1  =  si  <  Sa  <  * "  <  <  n  denote  the  subscripts  within 

{<1 ,  <3, . . . ,  tn}  at  which  records  occurred.  Thus  we  assume  a  total  of  r  records  within  the  sequence 
of  n  events,  whence  Yt^  denotes  the  record  value  associated  with  s,.  Without  loss  of  generality 
we  assume  that  larger  values  break  records,  i.e.,  <  Y^  <  •••  <  Hence  our  dataset  is 

{ Yi ,  sa ,  Y,, ,  33 ,...,  Sr ,  Y,, ,  no  records  after  t,, } . 

We  now  turn  to  the  likelihood  associated  with  this  data.  In  the  existing  literature  possibly 
inappropriate  simplification  with  regard  to  the  distribution  of  the  Yi's  has  been  made,  e.g.  that 
they  are  independent  and  perhaps  identically  distributed  as  well.  As  suggested  in  the  introduction, 
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we  suspect  that  in  many  cases  this  may  not  be  so,  and  hence  we  only  assume  that  Y  has  joint 
distribution  /(y;  0),  where  d  is  a  vector  of  uninown  parameters.  Therefore  the  required  likelihood 
can  be  denoted  as  Due  to  the  implicit  chronology  it  seems  natural  to 

calculate  this  likelihood  as 


fiVi ;  0)  pr(ja|yi ;  e)f{y^  |yi ,  sa;  0)  •  •  •  pr(3,  |yi ,  sa, . . . ,  y,,_j ;  0) 

x/(y.,|yi,Ja,...,s,;d)pr(no  records  after  t,Jyi,33,...,y*,;®) 


(1) 


To  obtmn  expression  (1)  we  need  to  compute  three  types  of  terms.  First  consider  the  term 

Define  the  event  4,-  =  <  . . Yfi+i-i  <  y«.}  with  ^4,  =  {Y^+i  < 

yj^f.il'n  <y*,}.  Then 


pr(y«j-t  <y,j  <  c,  i4i,>la, 

AuAi,.. 


Aa-i|y«{,t  —  1, . . .,  j  —  1;0) 


so  that,  assuming  the  derivative  exists, 

j\,y*i  lyi » •  •  •  I  *'1  — -  - j — j - 7 — I - : — 7 - : — r":« - 1 

for  y,^  >  y,^._j.  Similarly,  since  Sj  occurs  if  and  only  if  both  of  the  events  {Y,^  >  y,j}  ank  i4,_i 
occur,  the  conditional  probability  of  3j  is 


pr(i4i,  ^2i  •  •  •  >  •  •  •  j  J  "" 


pr(no  records  after  <,,|yi,3ai 


_  iw*(4i,  Aa, . . . ,  4f [y^;, t  —  1, , . . ,  rj 
V^{^A\f  A2\  •  •  • ,  ^r— i|y«{i  i  —  1|  •  •  •  j 


Lastly, 


Assembling  these  pieces  we  note  that  a  telescoping  of  terms  arises  as  we  calculate  the  product  in 
(1)  so  that  the  likelihood  simplifies  to 

A  moment’s  reflection  reveals  that  (2)  is,  in  fact,  obvious  and  might  have  been  written  down 
directly.  That  is,  if  we  let  U  =  V  =  Y\U,  and  we  write  /(yjtf)  =  /(u,v;0)  = 

/(u;®)/(v|u;0),  then  (2)  becomes 


(2) 


/(u;  0)  pr(y  €  B|u;  0)  =  /  /(u,  v;  0)dv  ,  (3) 

jb 

where  the  event  { V  6  B}  s  {Ai ,  As, . . . ,  Ar}. 

If  Y  is  a  Markov  sequence,  i  e.,  /(y;0)  =  /(Vi;®)  117=2  (2)  simplifies  to 


(4) 


In  Section  4  we  discuss,  in  some  detail,  several  spedfic  models  for  (2)  and  (4). 


3  Maximization  of  the  likelihood 


The  problem  we  face  can  be  viewed  as  one  of  missing  data.  Had  we  observed  the  v’s  we  would 
have  faced  a  standard  problem,  namely  maximization  of  /(u,v;9)  with  respect  to  0.  Instead,  we 
must  maximize  a  likelihood  of  the  form  (2).  To  describe  our  approach  for  obtaining  the  maximum 
likelihood  estimate  of  0,  it  will  be  easier  to  work  with  the  notaticnally  simpler  form  (3).  Given  the 
joint  density  /(u,  v;tf)  and  the  observations  u  =  Uo  we  need  to  calculate  (3)  as  a  function  of  0. 
Such  a  function  would  almost  never  be  available  explicitly  since  it  requires  an  (n  -  r)-dlmensional 
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integration  over  a  constrained  re^on.  In  general  n  —  r  will  be  large  and  such  integration  will  defy 
exact  or  approximate  calculation  unless  the  are  independent,  in  which  case  we  obtain  n  -  r 
one-dimensional  integrals. 

As  a  result,  we  are  drawn  to  Monte  Carlo  approaches  for  carrying  out  the  integration.  In  prin¬ 
ciple,  one  could  attempt  a  grid  search  for  the  maximizing  0,  performing  a  Monte  Carle  integration 
of  (3)  at  each  given  0.  If  the  dimension  of  0  is  at  all  large  such  searching  will  be  impractical;  even 
for  low-dimensional  0  the  method  we  now  propose  will  be  much  faster. 

Our  objective  is  to  create  a  Monte  Carlo  approximant  for  (3)  and  subsequently  maximize  the 
resulting  approximate  likelihood.  An  additional  iterati^'e  step  insures  that  the  likelihood  itself 
is  maximized.  We  obt<un  our  approximant  using  ideas  in  Geyer  and  Thompson  (1992)  and  the 
ossodated  discussion  by  Gelfand  (1992).  Observe  that  we  can  write 

/(uo,  v;  0)dy  =  I  jT  /(uo,  v;  0o)<iv  J  |  ^^^^7(v|uo;  »o)<iv  J  |  /(v|uo;  «o)dv|  . 

(5) 

Thus  if  Vj,  j  =  1, . . . ,  Af  are  drawn  from  ^(v|iio;  0o),  the  conditional  distribution  of  V  given  uo 
and  00  restricted  to  B,  a  Monte  Carlo  approximant  for  (5)  is  given  by 


1  ^  /(uo.Vj'.g) 
M  f(uo,yj;0o) ' 


Since  the  integral  in  this  expression  is  free  of  0,  an  approximate  maximum  likelihood  estimate  is 
obtained  by  maximizing  the  summation  in  (6)  with  respect  to  0.  If  /(u;  0)  is  available  explidtly  so 
that  /(v|u;5)  =  f(u,y;0)/f(u]  0)  is  as  well,  the  approximant  iu  (6)  can  be  written  equivalently 


as 


/(VjK’.g) 
/(v^luoiffo)  ■ 


(7) 
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Expression  (7)  has  computational  advantages  over  (6)  under,  for  example,  i  Markovian  assumption, 
and  is  used  in  the  examples  in  Section  S. 

A  natural  question  to  ask  is  how  to  draw  samples  from  p(v|uo;  0o)-  In  special  cases,  such  as  mul¬ 
tivariate  normal  models,  /(v|iio;0)  will  be  a  standard  distribution  so  that  g  could  be  sampled  by 
simple  rejection,  i.e.,  retaining  Vj  drawn  from  /(v|uo;  Bo)  if  and  only  if  it  belongs  to  B.  Such  sam¬ 
pling  will  generally  be  very  inefficient,  however.  An  attractive  alternative  for  general  /  is  Markov 
chain  Monte  Carlo  using  the  Gibbs  sampler  (see  e.g.  Gelfand  and  Smith,  1990).  Implementation  re¬ 
quires  sampling  from  the  complete  conditional  distributions  arising  from  v,  all  of  which  are  propor¬ 
tional  to  the  known  joint  density  /(u,  v;  tfo)-  In  particular  if  we  write  V  =  (l^,  then  we  need 
,to  sample  from  /(t>i|v(_i),  uq;  0o)  restricted  to  a  half  interval.  If  we  employ  a  Metropolis-within- 
Gibbs  algorithm  (Muller,  1992)  these  draws  can  be  made  from  truncated  standard  distributions. 
Such  draws  may  be  accompL’shed  using  a  method  suggested  in  Devroye  (1986,  p.  38). 

Geyer  and  Thompson  (1992)  observe  that  there  is  gain  in  iterating  the  approach.  More  precisely, 
starting  at  some  Bo  if  we  maximize  (6)  to  obtain  B,  then  we  can  set  Bi  =  B,  redo  the  maximization 
resulting  in  a  new  B,  set  B2  equal  to  this  new  value,  and  so  on.  The  objective  of  this  iteration  is  to 
insure  a  good  Monte  Carlo  approximant.  In  practice  a  few  iterations  obtain  Bi  in  the  vicinity  of  the 
true  d.  At  this  point,  one  final  iteration  with  M  very  large  will  produce  an  accurate  final  estimate. 
A  byproduct  of  this  approach  is  the  possibility  of  apprmcimating  the  asymptotic  covariance  of  the 
maximum  likelihood  estimator.  Tc  do  so  requires  calculation  (either  analytically  or  numerically)  of 
the  Hessian  matrix  from  (6)  or  (7)  at  6.  We  do  this  in  conjunction  with  our  examples  in  Section  5. 

We  note  that  theoretical  concerns  associated  with  maximum  likelihood  estimation,  regarding, 
e.g.,  existence,  uniqueness,  consistency,  and  asymptotic  normality,  have  not  been  addressed  herein. 
The  assumption  is  that  the  likelihood  under  consideration  is  reason:*bly  well  behaved.  Remedies 
for  poorly  behaved  likelihoods  are  well  discussed  in  the  literature  and  apply  here  as  well. 
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4  Specific  models 

4.1  Overview 

The  Monte  Carlo  approximant  approach  of  the  previous  section  demonstrates  that,  under  almost 
any  parametric  joint  density  for  Y,  maximum  likelihood  estimation  given  a  record  breaking  sequence 
can,  in  principle,  be  carried  out.  The  goal  of  this  section  is  to  explore  more  specific  models  that 
facilitate  calculations  and  are  motivated  by  the  chronological  nature  of  the  record  breaking  process. 
Rather  than  attempt  any  formalization  we  illustrate  with  three  examples.  The  first  two  are  fairly 
general,  while  the  third  assumes  a  Markov  Gaussian  model. 

4.2  Conditionally  independent  hierarchical  models 

Suppose  that  /(y ;  0)  arises  as  /(y;  0)  —  f  /(yja;  i^)  /(z;  where  0  =  (^,  t;).  We  assume 
that  /(z;  17)  is  a  proper  density  over  the  domain  .Z  of  z  so  that  / (y ;  0)  is  proper,  and  that  given  z,  the 
Yj’s  are  independent.  Distributional  classes  of  this  type  have  been  called  conditionally  independent 
hierarchical  models  (Kass  and  Steffey,  1989)  and  offer  a  rich  modeling  framework.  If  we  define 
ht(*i^)  =  E{Yi\z:rl))  then  E(Yi)  =  J5(fe,)  and  c{n>(li,l^)  =  cov{hi,hj).  Thus  appropriate  choice 
^  /(y l^i  V*)  aiid  /(z;  ri)  can  be  made  to  yield  desired  model  behavior. 

If  /(y  |z!  V*)  /(z;  17)  form  a  conjugate  pair,  marginalization  over  z  will  readily  provide  /(y;  9) 

and  we  may  proceed  as  in  Sec....n  3.  If  eT.plidt  marginalization  is  not  possible,  how  can  we  obtsdn 
a  Monte  Carlo  approximant  to  (3)?  We  can  write 

I  /(uo,v;»)dv 
JB 


9 


=  |jj^/(uo,v;«o)dv| 

>(/./.  /(v|uc,  z;  ^o)/(iioiz;  V'o)/(*;  »?o)<fzdv  J 


The  assumed  conditional  inr!(?pendence  insures  that  all  conditional  densities  in  (8)  can  be  written 
down  immediately.  Sup;  (v^,Zj),  j  =  are  drawn  from  the  density  proportional  to 

/(vluo,2;t^o)/(“ol*>^o)/(2;»?o)  restricted  to  S  x  S.  Then  an  approximant  to  (8)  is  given  by 


/  /(uo,v;0o)dv  X 

jb 


1  ^ 

^  /(«ro,  v;|z;;  ’ 


(9) 


which  is  a  minor  extension  oi  equation  (6).  The  required  sampling  of  v  and  z  over  B  X  Z  may 
be  carried  oat  by  extending  the  Gibbs  sampler  as  follows.  Given  z  draw  V  from  /(v|lIo,*;t^'o) 
restricted  to  B  using  the  complete  conditional  distributions  for  Vi,  which  are  free  of  Uc  and  V(_i} 
under  conditional  independence.  Given  v  draw  Z  using  the  complete  conditionals  for  the  Zi,  which 
are  proportional  to  the  nonnormalized  form  /(v|uo.2;  V'o)/(*>ol*;t^o)/(*i*7o)"  The  Zi  are  treated 
as  missing  data,  just  like  the  Vi. 


4.3  Moving  window  sum  processes 

A  natural  extension  of  the  case  of  independent  1^’s  is  to  envision  them  arising  as  observations 
from  a  moving  window  sum  of  independent  variables.  We  illustrate  for  a  window  of  size  two. 
Suppose  that  Zo,Zi,...,Zn  are  independent  with  Zi  'v  /i(  •  ; 6).  Let  Yi  =  Zi  +  Zi-\.  Then  the 
joint  distribution  of  Y  cau  be  straightforwardly  written  down.  Moreover  =  B{Z^  +  EiJZi^i) 
and  cov(Yi..i,Yi)  =  var(Zi-i),  >o  th&t  appropriate  choices  of  the  fi  will  provide  desired  model 
behavior. 

As  a  concrete  example,  suppose  a  new  species  is  introduced  into  an  area  and  thereafter  seasonal 
population  counts  are  of  interest  but,  in  fact,  over  the  course  of  say  n  seasons  only  the  record 
breaking  seasons  and  counts  are  recorded.  Typically  such  counts  are  modeled  as  Poisson  variates, 
but  here  it  might  be  inappropriate  to  assume  they  are  Independent.  Suppose  we  let  Zi  be  inde- 
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pendent  Poisson(Ai)  random  variables  for  i  =  0,1,..., n  and  set  =  Z,  +  Zj_i.  Then  clearly 
y;  ~  Poi3son(A,  +  Ai_i),  c<n;(yi,y,_i)  =  A,_i,  and  E{Y,\Yi.i)  -  A,  +  Aj_iy,_i/(A,_i  +  Ai_2),  linear 
in  Vi-i.  To  create  a  drift  in  E'Y^)  we  could  take  the  A,  to  be  specified  parametric  functions.  If  Ai 
is  linear  in  t,  we  have  E{Yi)  linear  in  t;  if  A,  is  exponential  in  t,  then  log£(y,)  is  linear  in  t. 

4.4  Markov  Gaussian  models 

A  Markov’an  assumption  for  a  record  breaking  sequence  seems  a  plausible  yet  relatively  sim¬ 
ple  extension  from  independence.  Recall  that  under  such  an  assumption  the  likelihood  simpli¬ 
fies  a  bit  as  in  expression  (4).  In  particular  suppose  that  all  marginal  and  conditional  densi¬ 
ties  from  f{y\0)  can  be  obtained  explicitly,  as  in  the  case  of  f{y,0)  multivariate  normal.  Let 

Wi  =  (y,i+i,...,y„+,-i),  i  =  1 . r-  1,  and  W,  =  {Y»,+i,.  ..,Yn).  Then  the  Wj  are  con- 

dltionally  independent  given  u  yielding  (4).  Since  pr(V  C  .0|uo,0)  can  be  written  as  a  product 
of  r  terms,  an  approximant  for  each  term  can  be  created  using  low-dimensional  Gibbs  samplers 
or  some  other  numerical  integration  technique,  rather  than  requiring  one  fully  (n  -  r)-dimensional 
sampler.  Expression  (7)  would  be  recast  as  a  product  of  sums  utilizing  the  conditional  distributiouh 
/(wduoi®),  t= 

Suppose,  in  fact,  that  events  occur  at  regular  intervals  and  the  process  is  first  order  Gaussian, 
i.e.  Yi  -  fti  =  p[Yi-\  -  Hi-i)  +  u  where  e*  ~  jV(0,a*)  and  m  =  E{Yi)  is  a  specified  parametric 
function.  As  in  the  previous  example  the  pi  can  reflect  drift;  for  instance,  pi  =  a  +  0i\i  used  in 
the  examples  of  Section  5.  Clearly  vor(yi)  =  and  cov^i^Yi^j)  =  a‘*pP  l{\-  p^).  Such  a 

model  extends  Smith  (1988)  and  yields  routine  distribution  theory.  Alternatively,  Markov  Gaussian 
models  may  be  created  through  the  inverse  covariance  matrix.  Whittaker  (1990,  Chapters  5  and 
6)  presents  a  very  readable  discussion  of  this  approach.  In  particular  he  shows  that  if  only  the 
diagonal  and  first  off-diagonal  terms  in  the  inverse  of  the  covariance  matrix  are  nonzero,  then  the 
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joint  distribution  is  a  Markov  Gaussian  model. 

A  generalization  in  the  spirit  of  Subsection  4.2  assumes  that  the  ^  are  random.  The  idea 
is  that  randomly  arising  larger  ^  will  encourage  the  breaking  of  records.  If  so  then  we  might 
write  the  model  in  hierarchical  form  as  »?)•  Note  however  that  we  do  not  have  a 

conditionally  independent  hierarchical  model.  An  illustration  would  be  the  first  order  dynamic,  or 
state-space,  model  (see  e.g.  Carlin,  Poison  and  Stolfer,  1992).  If  we  can  explicitly  marginalize  over 
ft  we  will  find  ourselves  with  a  Markov  Gaussian  model  again.  If  not,  we  can  create  an  approximant 
similar  to  (9). 

A  final  related  point  here  concerns  the  extension  of  such  models  to  a  continuous  time  process 
{y (t)  :  t  >  0} ,  where  events  occur  at  times  ti , . . . ,  tn  resulting  in  y (ti ),...,  y(t„).  Here  a  standard 
theorem  in  stochastic  processes  (see  e.g.  Breiman,  1986,  p.  289)  notes  that  a  Gaussian  stationary 
process  is  Markov  if  and  only  if  its  autocovariance  function  r(t)  is  of  the  form  ^7!*!,  0  <  7  <  1.  In 
other  words,  the  only  stationary  Markov  Gaussian  process  is  of  the  form  we  have  just  described. 

5  Numerical  examples 

5.1  Simulated  high  correlation  data 

To  illustrate  the  performance  of  the  methodology  for  record  breaking  data  exhibiting  high 
autocorrelation,  consider  the  n  =  SO  simulated  yi  values  given  in  Table  1.  These  data  were  generated 
according  to  the  linear  first  order  Gaussian  model  introduced  in  Subsection  4.4,  where  we  let  a  =  0, 
P  =  —  1.4,  p  =  0.8,  and  set  y\  =  1.  The  yi  values  are  displayed  graphically  as  dots  in  Figure 

1(a);  the  r  =  34  record  breaking  observations  are  boxed  for  easier  identification.  Fitting  a  simple 
linear  regression  model  to  the  full  50-point  dataset,  we  see  in  Figure  1(b)  the  wave  pattern  often 
present  in  the  residual  plot  for  serially  correlated  data,  and  a  clear  positive  trend  in  the  plot  of 
each  residual  versus  the  immediately  preceeding  residual  in  Figure  1(c).  Both  of  these  diagnostics 
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suggest  high,  positive  correlation  for  our  generated  dataset. 


i 

no  record 

Vi 

i 

no  record 

Vi 

t 
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Vi 
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no  record  yi 
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1.00 
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19.14 

27 
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31.08 

43 

43.73 

6 

*. 
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47.70 
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34.58 
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23.12 

33 

* 

33.08 

46 

49.37 
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* 

8.50 

21 

« 

24.95 

34 

30.19 

47 

52.01 

10 

9.37 

22 

^  * 

24.83 

35 

* 

31.40 

48 

53.25 

11 

10.95 

23 

25.93 

36 

« 

33.00 

49 

56.84 

12 

12.33 

24 

28.85 

37 

35.72 

50 

•  54.92 

25 

« 

28.50 

38 

35.73 

T^ble  1:  Simulated  data  having  p  =:  0.8 

The  magnitude  of  the  Pearson  correlation  present  in  the  lagged  residual  plot,  0.83,  indicates 
the  severity  of  the  autocorrelation,  but  its  value  is  a  bit  misleading  since  this  statistic  does  not 
actually  estimate  the  parameter  p  in  an  AR(1)  model  with  a  time  trend.  Instead,  we  might  look 
at  the  differenced  series  Dt  =  Yi  —  Pt-i,  t  =  2,..., 50,  and  observe  that  var{Dt)  =  2(7’^/(l  +  p) 
and  cov{Dt,Dt-i)  =  -(<r*(l  -  p)]/(l  +  p),  »o  that  corr(i?j,i7t_i)  =  -(1  -  p)/2.  Hence  if  6  is 
the  sample  estimate  of  this  correlation,  we  obtain  the  crude  point  estimator  p  =  1  +  min(0, 2C). 
Figure  1(d)  plots  the  {Dt,  Dt-i)  pairs  for  our  dataset;  the  resulting  p  equals  0.901.  Thus  we  have 
substantial  evidence  that  a  model  incorporating  p  wiU  substantially  improve  our  inference. 

Of  course,  our  models  will  not  analyze  the  full  data  as  above,  but  only  the  68%  of  the  data  that 
constitute  record  breaking  observations.  Notice  that  there  are  three  gaps  of  length  four  (t  =  19-22, 
25-28,  and  33-36),  one  gap  of  length  two  (»  =  6-7),  and  two  gaps  of  length  one  (t  =  9  and  50). 
We  will  use  our  Monte  Carlo  algorithm  to  iind  the  maximum  likelihood  estimate  of  0  =  {a,0,  a,p) 
given  the  observed  data  uq  =  (yjj»...,y»,,).  We  will  compare  the  fit  of  this  model  to  that  of  a 
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reduced  model  where  we  ignore  autoconel  cation  by  insisting  that  p  =  Q. 

Using  notation  suggested  i:.  Subsection  4.4,  we  write  wi  =  {yetHTY,  wa  =  (y^9^y20.y2l^y2a)^ 
W3  =  (ya5,y26,ya7,yj8)',  and  W4  =  (y33,y34,y35,y36)',  so  that  V  =  (yg.yso.wi.wj.w^.wi)'.  Then 
the  likelihood  (4)  for  our  dataset  takes  the  form 

L(e-,  uo)  =  f{yi\e)  {nj=2  /(y*ily*;-.i :  ®)}  {i^  /(yglys,  yio;  5)dy8}  /(y5o(y49;  0)dy5o} 

X  { /^oo  /(wilys.  ys:  0)dwx}  ^  /(wjlyxs, yaa;  e)dv^2} 

X  {/ri  /l!?i  ir:,  iTci  /(W3|y34. yaa;  0)dw3}  {/T.^  /Ti  JTi  /(W4|y3a,  ysr;  0)dw,}  , 

Since  we  have  assiuned  a  first  order  linear  Gaussian  model,  the  distribution  for  each  observed  record 
(pven  ihe  one  immediately  preceding,  /(y»Jy,,_, ;  0),  is  readily  available  from  standard  multivariate 
normal  theory.  Similarly,  the  required  conditional  distributions  for  the  gaps  y9iyso>wi,W2,W3  and 
W4  are  also  available  as  normals,  complf'::ng  the  likelihood  specification.  Thus  a  Monte  Carlo 
approximant  of  the  form  in  (7)  with  the  Markovian  simplifications  discussed  in  Subsection  4.4  is 
convenient. 

To  carry  out  the  required  Yj  sampling,  we  first  note  that  y^j  and  y^Qj  values  may  be  generated 
directly  from  their  (suitably  truncated)  complete  conditional  distributions,  obtained  from  standard 
multivariate  normal  theory  as 

/(y9|y8,yio;e)aiyr(^,<7V(i  +  p*))/(-oo,v.)(yb).  “d 

/(ysolyss;  0)ocN  (pso,  <r%.co,vi,)(y5o), 

where  fig  =  a  +  9^  +  -  a  -  gfij  and  pso  =  a  +  500  +  p(y49  ~  o  -  490).  For  the 

missing  data  lying  in  gaps  of  length  greater  than  1,  however,  we  resort  to  Markov  chain  Monte 
Carlo  methods  to  obtain  the  necessary  samples.  For  each  such  missing  y,-,  the  complete  conditional 
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density  to  be  used  for  generation  of  the  y,ys  is  of  the  form 

where  m  =  a  +  0i  +  ^ItzX^SliiL  Vn-i  is  the  most  recent  record  value,  and  it 

is  understood  that  either  or  both  of  the  conditioning  values  yi_i  and  yi^i  may  themselves  be 
Monte  Carlo  samples  if  they  too  correspond  to  non-record  values.  In  our  implementation,  we  ran 
M  parallel  sampling  chains  for  N  "bum-in”  iterations  to  reach  the  chain’s  ergodic  distribution, 
retaining  only  the  value  from  each  chain.  While  somewhat  wasteful,  this  approach  was  an  easy 
way  to  oblain  independent  iterates  in  a  situation  where  the  required  generation  was  inexpensive. 
We  took  W  =  20,  a  conservative  bura-in  value  based  on  our  experience  with  normal  sampling 
models.  In  less  regular  modeling  scenarios,  a  monitoring  diagnostic  may  help  select  the  proper 
value  of  N.  Under  the  parallel  sampling  approach,  the  recent  papers  by  Gelman  and  Rubin  (1992), 
Bitter  and  'Tuiner  (1992),  and  Roberts  (1992)  are  particularly  helpful  in  this  regard. 

Given  the  sampled  values  {vj  =  (y|^-,  Wy,  wj^-,  wj^-,  wjy),  j  =  1, . . . ,  M),  the  Monte  Carlo 
apprarimant  (7)  is  easily  computed.  As  mentioned  in  Section  3,  our  algorithm  uses  a  small  number 
of  iterations  to  update  this  0o  value  before  the  Anal  maximization.  Our  program  was  written  in 
FORTRAN  and  called  the  IMSL  routine  DBCONF,  a  quasi-Newton  algorithm  employing  a  finite 
difference  gradient,  to  perform  the  necessary  maximizations. 

Using  Af  =  10, 000  replications  on  the  third  and  final  iteration  of  the  algorithm  we  obtained  the 
fuU  model  maximum  likelihood  estimate  6  =  (—0.008, 1.073, 1.706,0.786).  A  numericaUy  computed 
Hessian  produced  the  asymptotic  standard  deviation  vector  (2.001,  0.068,  0.220,  0.091);  the  esti¬ 
mated  correlations  between  the  elements  of  &  were  all  negligible  with  the  exception  of  that  between 
&  and  -0.85.  Fitting  the  simple  trended  AR(1)  model  to  the  entire  set  of  n  =  50  observations 
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results  in  the  point  estimate  (0.208,  1.068,  1.632,  0.819)  and  associated  standard  deviation  vector 
(2.165, 0.072, 0.168,  0.078).  The  estimation  that  uses  record  data  only  appears  to  be  degraded  very 
little,  despite  the  loss  of  nearly  one  third  of  the  observations. 

Repeating  the  algorithm  using  the  same  number  of  iterations  and  replications  for  the  reduced 
model  (p  =  0)  gave  a  =  106, 4  =  1.055,  and  a  =  3.043.  Figure  1(a)  shows  that  the  trend  lines 

obtained  from  the  full  and  reduced  models  are  virtually  indistinguishable.  Still,  the  likelihood 
ratio  statistic  equals  36.76  on  1  degree  of  freedom,  and  this  in  turns  leads  to  an  Akmke  information 
criterion  of  35.76  and  a  Bayesian  information  (Schwarz)  criterion  of  33.23  -  the  latter  suggesting  a 
Bayes  factor  in  favor  of  the  full  model  of  over  16  million! 

Clearly  all  of  the  above  model  choice  criteria  confirm  that  the  full  model  is  vastly  superior,  but 
apparently  its  value  lies  primarily  in  its  increased  precision,  indicated  by  its  much  smaller  a  value. 
Since  this  added  precision  should  translate  into  better  predictive  ability,  we  decided  to  investigate 
farther  using  a  bootstrap  approach.  We  drew  yloj, . . . ,  yloj  from  the  fitted  full  and  reduced  models 
for  j  =  1, . . . ,  2000,  being  careful  to  constrain  y^Qj  to  be  less  than  ^49,  as  was  observed  in  the  original 
dataset.  Figure  2(a)  then  plots  the  5*^,  50^  and  95*^  percentiles  of  the  bootstrap  distribution  over 
time.  We  see  that  the  full  model  is  indeed  more  precise,  though  its  advantage  gradually  diminishes. 
The  full  model  expects  slightly  larger  future  observations  on  the  average  due  to  the  recent  history 
of  records  at  937  through  P49.  The  compression  of  the  95^  percentile  for  y^gj  is  apparently  due  to 
the  restriction  that  it  not  exceed  2/49  -  an  extra  bit  of  information  not  usually  available  in  a  truly 
predictive  setting. 

Finally,  Figure  2(b)  gives  the  histogram  of  the  bootstrap  distribution  of  the  wmting  time  until 
the  first  record  after  ^49  under  the  full  model.  Counting  the  known  non-record  value  at  t  =  50,  the 
model  suggests  substantial  probabilities  of  gaps  of  length  4  or  5,  and  a  nonnegligible  chance  of  a 
gap  as  long  as  10  time  points.  Again,  this  behavior  is  understandable  in  light  of  the  high  estimated 
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autocorrelation  combined  with  the  long  string  of  records  recently  observed  to  have  ended;  indeed, 
such  gaps  have  occurred  in  our  sample. 


5.2  Olympic  high  jump  data 


As  a  second  illustration,  consider  the  data  displayed  in  Table  2.  These  are  the  record  breaking 
Ol3rmpic  high  jumps  since  1896,  as  presented  in  the  World  Almanac  and  Book  of  Facts  (1989). 
Besides  being  a  prototype  for  many  sports  history  datasets  of  this  type,  this  dataset  is  interesting 
because  it  contains  two  distinct  types  of  missing  data.  First,  no  record  breaking  high  jump  occurred 
at  the  Olympics  in  the  years  1904,  1920,  1928,  1932,  1948,  1972,  and  1984.  Second,  no  record 
occurred  in  the  years  1916, 1940,  and  1944  because  the  Olympics  themselves  were  cancelled  due  to 
the  intervening  world  wars.  Our  likelihood  must  reflect  this  distinction  between  the  failures  and 
the  cancellations. 
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Sj 

year 

record  (in.) 

athlete  (country) 

1 

1 

1896 

71.25 

Ellery  Clark  (US) 

2 

2 

1900 

74.80 

Lrwing  Baxter  (US) 

3 

4 

1908 

75.00 

Harry  Porter  (US) 

4 

5 

1912 

76.00 

Alma  Richards  (US) 

5 

8 

1924 

78.00 

Harold  Osborn  (US) 

6 

11 

1936 

80.00 

Cornelius  Johnson  (US) 

7 

15 

1952 

80.32 

Walter  Davis  (US) 

8 

16 

1956 

83.50 

Charles  Dumas  (US) 

9 

17 

1960 

85.00 

Robert  Shavlakadze  (USSR) 

ID 

18 

1964 

85.75 

Valery  Brumel  (USSR) 

11 

19 

1968 

88.25 

Dick  Fosbury  (US) 

12 

21 

1976 

88.50 

Jacek  Wszoia  (Poland) 

13 

22 

1980 

92.75 

Gerd  Wessig  (E.  Germany) 

14 

24 

1988 

93.50 

Guennadi  Avdeenko  (USSR) 

I^ble  2:  Olympic  High  Jump  Records,  1896-1988 


Since  world-class  athletes  typically  compete  in  more  than  one  year's  Olympics,  and  perhaps 


since  the  athletes  of  a  few  countries  seem  to  dominate  this  event,  we  might  wish  to  fit  a  model 
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for  dependent  data.  With  r  =  14  broken  records  in  only  21  Olympie  attempts,  there  is  also  a 
clear  need  to  model  an  increasing  mean  over  time.  Because  this  is  agmn  a  discrete  time  dataset, 
we  shall  attempt  to  fit  the  normal  linear  AR(1)  model  presented  in  Subsection  4.4.  For  simplicity 
we  let  /ii  =  Q  +  =  Q  +  ^{year  -  1892)/4,  and  investigate  the  distribution  of  the  waiting  time 

until  the  next  record.  Basak  and  Bagchi  (1990)  used  Laplace’s  method  to  estimate  the  predictive 
distribution  of  the  magnitude  of  the  next  record  given  the  14  records  in  this  dataset,  but  ht  only 
a  simple  model  that  assumed  uncorrelated  observations  having  a  constant  mean  over  time.  Their 
analysis  also  ignored  the  failures  and  cancellations,  and  thus  discarded  the  information  carried  by 
the  failed  attempts.  Under  this  model  the  cancellations  may  be  ignored,  since  they  cannot  be 
connected  to  any  observed  data  and  thus  do  not  affect  the  likelihood. 

Writing  w  =  (ygjj/io)'  «o  that  v  =  (ys.yao.ysaiyr.yis.w')',  the  likelihood  is  given  by 

m  no)  =  fivu  »)  {n;=.  /(».,!».,-. ;«)}  {n,en  /(wlw-t. »«;  »)*/} 

where  Fi  =  {3, 20, 23}.  All  of  the  distributions  in  this  expression  are  av^able  as  univariate  normals, 
except  for  the  bivariate  normal  distribution  of  w. 

Turning  to  our  sampling-based  implementation,  we  again  work  with  expression  (7),  which  re¬ 
quires  simulated  y*j  values  for  the  seven  observed  failures.  Except  for  the  back-to-back  f^ures  in 
1928  and  1932,  all  of  these  represent  gaps  of  length  1  and  thus  may  be  generated  without  the  use  of 
Gibbs  sampling  from  complete  conditional  distributions  similar  to  those  displayed  in  equation  (10). 
Of  course  when  one  of  these  fulures  abuts  a  cancellation  (as  in  1920  and  1948),  there  are  slight 
modifications  to  the  mean  and  variance  to  reflect  the  fact  that  the  adjacent  record  is  more  than  one 
position  away,  but  the  associated  conditional  normal  calculations  are  still  routine.  For  the  single 
gap  of  length  two,  we  used  the  Gibbs  sampler  with  W  =  20  to  obtain  y^j  and  yJ^j  values  generated 
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from  complete  conditionals  of  the  form  given  in  equation  (11).  No  modifications  are  needed  in  this 
case  as  both  ys  and  yn  are  observed  records. 

Once  in  possession  of  the  sampled  values  {v^  =  {y3j,y20j,yhj,y7j>yuj>'^j)>  j  = 
we  may  evaluate  the  Monte  Carlo  approximant  to  the  likelihood  (7),  which  is  again  routine 
using  multivariate  normal  theory.  Generating  M  =■  10,000  replications  at  the  final  iteration, 
this  dataset  required  only  N  =  Z  iterations  to  produce  the  maximum  likelihood  estimate  6  = 
(70.037,0.894,1.697,0.334),  with  associated  asymptotic  standard  deviation  vector  (1.384,  0.084, 
0.368,  0.372).  Appealing  to  the  asymptotic  normality  of  the  maximum  likelihood  estimator,  the 
data  strongly  suggest  an  increase  of  neariy  1  inch  in  the  best  Olympic  high  jump  every  four  years, 
but  offer  only  mild  evidence  of  a  positive  correlation  amongst  these  quadrennial  performances.  Re¬ 
peating  the  calculations  for  the  reduced  model  having  p  =  0,  we  obtained  d  =  70.072,  $  =  0.881, 
and  d  =  1.869,  estimates  which  show  little  movement  from  those  in  the  full  model.  The  single 
degree  of  freedom  x’  likelihood  ratio  statistic  between  these  two  models  is  1.91,  implying  a  p- value 
of  0.168.  The  Akaike  criterion  is  0.91  (a  slight  preference  for  the  full  model)  while  the  Schwarz 
criterion  is  -0.73,  for  an  apprommate  Bayes  factor  of  0  694  =  1/1.44  (a  slight  preference  for  the 
reduced  model).  Thus  the  data  are  incondusive  on  the  issue  of  whether  to  indude  p  in  the  modd 
or  not. 

Figure  3(a)  addresses  the  prediction  question,  again  through  a  bootstrapping  approach.  While 
the  fitted  full  modd  has  been  used  in  this  analysis,  the  near  linearity  of  all  three  lines  in  this  plot 
indicates  only  a  small  gain  in  predictive  predsion  over  what  might  be  expected  from  the  simple 
nncorrdated  modd.  Finally,  Figure  3(b)  shows  a  nearly  linear  decline  in  the  probabilities  of  the 
waiting  time  distribution,  with  a  new  record  almost  certain  to  have  occurred  within  the  next  five 
Olympic  meets.  The  37%  predicted  chance  of  a  record-breaking  high  jump  at  the  1992  Olympics 
seems  consistent  with  the  overall  observed  proportion  of  records  in  back-to-back  competitions. 
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Simulated  high  correlation  record  breaking  dataset 


c)  Lagged  residual  plot;  corr  -  0.83  d)  I  agged  difference  pk>t;  corr  -  -0.05 ;  rtto  hat  -  0.901 


Figure  2.  Prediction  for  Simulated  High  Correlation  Data 


2000  bootstrap  reps 


a)  5th,  50th,  and  95th  percentiles,  boostrap  distribution  of  future  observations 
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b)  Bootstrap  distribution  of  waiting  time  until  next  record 
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Sumnuury 

In  this  paper  we  consider  the  analysis  of  record  breaking  datasets,  where  only  observations 
that  exceed  (or  only  those  that  fall  below)  the  cu.Tent  extreme  value  are  recorded.  jSxamples  of 
application  areas  leading  to  data  of  this  t}rpe  include  industrial  stress  testing,  meteorological  anal¬ 
ysis,  sporting  and  athletic  events,  and  oil  and  mining  surveys.  The  inherent  missing  data  structure 
present  in  such  problems  leads  to  likelihood  functions  that  contain  possibly  high-dimensional  inte¬ 
grals,  thus  rendering  traditional  maximum  likelihood  methods  difficult  or  infeasible.  Fortunately, 
we  may  obtain  arbitrarily  accurate  approximations  to  the  likelihood  function  by  iteratively  apply¬ 
ing  Monte  Carlo  integration  methods  (Geyer  and  Thompson,  1992).  Subiteration  using  the  Gibbs 
sampler  may  help  to  evaluate  any  multivariate  integrals  encountered  during  this  process.  This  ap¬ 
proach  enables  a  far  more  sophisticated  set  of  parametric  models  than  have  been  applied  previously 
in  record  breaking  contexts.  In  particular,  we  illustrate  the  methodology  for  a  wide  array  of  discrete 
and  continuous  distributional  settings,  and  for  observations  that  may  be  correlated  and  subject  to 
mean  shifts  over  time.  Related  issues  in  model  selection  and  prediction  are  also  addressed.  Finally, 
we  present  two  numerical  examples.  The  iirst  uses  a  generated  datuet  exhibiting  a  high  degree  of 
autocorrelation,  while  the  second  involves  records  in  Olympic  high  jump  competition. 


